linear map
Compiling to recurrent neurons
Velez-Ginorio, Joey, Amin, Nada, Kording, Konrad, Zdancewic, Steve
Discrete structures are currently second-class in differentiable programming. Since functions over discrete structures lack obvious derivatives, differentiable programs cannot differentiate through them, which limits where they can be used. For example, when programming a neural network, conditionals and iteration cannot be used everywhere; they can break the derivatives necessary for gradient-based learning to work. This limits the class of differentiable algorithms we can directly express, imposing restraints on how we build neural networks and differentiable programs more generally. However, these restraints are not fundamental. Recent work shows conditionals can be first-class, by compiling them into differentiable form as linear neurons. Similarly, this work shows iteration can be first-class -- by compiling it to linear recurrent neurons. We present a minimal typed, higher-order and linear programming language with iteration called $\textsf{Cajal}{\scriptstyle(\multimap, \mathbb{2}, \mathbb{N})}$. We prove its programs compile correctly to recurrent neurons, allowing discrete algorithms to be expressed in a differentiable form compatible with gradient-based learning. With our implementation, we conduct two experiments in which we link these recurrent neurons against a neural network solving an iterative image transformation task, determining part of its function prior to learning. As a result, the network learns faster and with greater data-efficiency relative to a neural network programmed without first-class iteration. A key lesson is that recurrent neurons enable a rich interplay between learning and the discrete structures of ordinary programming.
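The compilation target is easy to picture even without the paper's machinery. Below is a minimal sketch of the idea in plain NumPy rather than $\textsf{Cajal}$ (the encoding and names are illustrative assumptions, not the paper's construction): a discrete loop, here a mod-3 counter, becomes a fixed linear recurrent neuron, so iterating it is just repeated application of a linear map and stays compatible with gradient-based learning.

```python
# Illustrative sketch, not Cajal's compiler: a discrete loop as a fixed
# linear recurrent neuron. Columns of W map one-hot states 0 -> 1, 1 -> 2,
# 2 -> 0, i.e. one step of a mod-3 counter.
import numpy as np

W = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])

def run_loop(h0, n_steps):
    """Unrolled linear recurrence h_{t+1} = W @ h_t."""
    h = h0
    for _ in range(n_steps):
        h = W @ h
    return h

h0 = np.array([1., 0., 0.])     # one-hot encoding of state 0
print(run_loop(h0, 5))          # -> [0. 0. 1.], i.e. 5 mod 3 == 2
```

Because the loop body is linear, such a neuron can be linked against a learned network without breaking its derivatives.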
Compiling to linear neurons
Velez-Ginorio, Joey, Amin, Nada, Kording, Konrad, Zdancewic, Steve
We don't program neural networks directly. Instead, we rely on an indirect style where learning algorithms, like gradient descent, determine a neural network's function by learning from data. This indirect style is often a virtue; it empowers us to solve problems that were previously impossible. But it lacks discrete structure. We can't compile most algorithms into a neural network -- even if these algorithms could help the network learn. This limitation occurs because discrete algorithms are not obviously differentiable, making them incompatible with the gradient-based learning algorithms that determine a neural network's function. To address this, we introduce $\textsf{Cajal}$: a typed, higher-order and linear programming language intended to be a minimal vehicle for exploring a direct style of programming neural networks. We prove $\textsf{Cajal}$ programs compile to linear neurons, allowing discrete algorithms to be expressed in a differentiable form compatible with gradient-based learning. With our implementation of $\textsf{Cajal}$, we conduct several experiments where we link these linear neurons against other neural networks to determine part of their function prior to learning. Linking with these neurons allows networks to learn faster, with greater data-efficiency, and in a way that's easier to debug. A key lesson is that linear programming languages provide a path towards directly programming neural networks, enabling a rich interplay between learning and the discrete structures of ordinary programming.
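To make the claim about conditionals concrete, here is a small sketch (my own illustration, not the actual $\textsf{Cajal}$ compilation scheme): booleans are one-hot encoded, and an `if` expression whose branches are known values compiles to a single linear neuron whose weight matrix stores the two branches, so branch selection is a matrix-vector product and gradients flow through it.

```python
# Hedged sketch: "if b then x else y" as a linear neuron over one-hot booleans.
import numpy as np

TRUE  = np.array([1., 0.])
FALSE = np.array([0., 1.])

def compile_if(then_val, else_val):
    """Weights of a linear neuron computing the conditional."""
    return np.column_stack([then_val, else_val])    # shape (d, 2)

branch = compile_if(np.array([3., 1.]), np.array([0., 7.]))
print(branch @ TRUE)    # -> [3. 1.]  (the 'then' branch)
print(branch @ FALSE)   # -> [0. 7.]  (the 'else' branch)
```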
Copresheaf Topological Neural Networks: A Generalized Deep Learning Framework
Hajij, Mustafa, Bastian, Lennart, Osentoski, Sarah, Kabaria, Hardik, Davenport, John L., Dawood, Sheik, Cherukuri, Balaji, Kocheemoolayil, Joseph G., Shahmansouri, Nastaran, Lew, Adrian, Papamarkou, Theodore, Birdal, Tolga
We introduce copresheaf topological neural networks (CTNNs), a powerful unifying framework that encapsulates a wide spectrum of deep learning architectures, designed to operate on structured data, including images, point clouds, graphs, meshes, and topological manifolds. While deep learning has profoundly impacted domains ranging from digital assistants to autonomous systems, the principled design of neural architectures tailored to specific tasks and data types remains one of the field's most persistent open challenges. CTNNs address this gap by formulating model design in the language of copresheaves, a concept from algebraic topology that generalizes most practical deep learning models in use today. This abstract yet constructive formulation yields a rich design space from which theoretically sound and practically effective solutions can be derived to tackle core challenges in representation learning, such as long-range dependencies, oversmoothing, heterophily, and non-Euclidean domains. Our empirical results on structured data benchmarks demonstrate that CTNNs consistently outperform conventional baselines, particularly in tasks requiring hierarchical or localized sensitivity. These results establish CTNNs as a principled multi-scale foundation for the next generation of deep learning architectures.
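To ground the abstraction, the following toy layer sketches the flavor of computation CTNNs generalize (the function and its simplifications are mine, not the paper's definition): message passing in which each directed edge carries its own linear map between feature spaces, i.e. a copresheaf over the underlying graph.

```python
# Toy copresheaf-style message passing: one learned matrix per directed edge.
import numpy as np

def copresheaf_layer(h, edges, edge_maps):
    """h: dict node -> feature vector; edge_maps: dict (u, v) -> matrix."""
    out = {v: np.zeros_like(x) for v, x in h.items()}
    for (u, v) in edges:
        out[v] += edge_maps[(u, v)] @ h[u]      # transport u's features to v
    return {v: np.tanh(x) for v, x in out.items()}

h = {0: np.array([1.0, 0.5]), 1: np.array([-0.3, 2.0])}
edges = [(0, 1), (1, 0)]
rng = np.random.default_rng(0)
edge_maps = {e: rng.standard_normal((2, 2)) * 0.1 for e in edges}
print(copresheaf_layer(h, edges, edge_maps))
```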
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models
Finkelshtein, Ben, Ceylan, İsmail İlkan, Bronstein, Michael, Levie, Ron
Graph machine learning architectures are typically tailored to specific tasks on specific datasets, which hinders their broader applicability. This has led to a new quest in graph machine learning: how to build graph foundation models capable of generalizing across arbitrary graphs and features? In this work, we present a recipe for designing graph foundation models for node-level tasks from first principles. The key ingredient underpinning our study is a systematic investigation of the symmetries that a graph foundation model must respect. In a nutshell, we argue that label permutation-equivariance alongside feature permutation-invariance are necessary in addition to the common node permutation-equivariance on each local neighborhood of the graph. To this end, we first characterize the space of linear transformations that are equivariant to permutations of nodes and labels, and invariant to permutations of features. We then prove that the resulting network is a universal approximator on multisets that respect the aforementioned symmetries. Our recipe uses such layers on the multiset of features induced by the local neighborhood of the graph to obtain a class of graph foundation models for node property prediction. We validate our approach through extensive experiments on 29 real-world node classification datasets, demonstrating both strong zero-shot empirical performance and consistent improvement as the number of training graphs increases.
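As a concrete instance of the kind of layer the recipe calls for, here is a toy aggregation over a node's neighbourhood (the specific form and coefficients are my own simplification, not the paper's layer): it is equivariant to permutations of the label channels, invariant to permutations of the feature channels, and insensitive to the order of neighbours.

```python
# Toy node/label/feature-symmetric aggregation over a neighbourhood.
import numpy as np

def equivariant_layer(X, Y, a=1.0, b=0.5, c=0.2, d=0.1):
    """X: (n_neigh, n_feat) features; Y: (n_neigh, n_labels) one-hot labels.
    Returns a vector of per-label scores."""
    s = X.mean(axis=1)                        # invariant to feature permutations
    per_label = a * (Y * s[:, None]).sum(0)   # permutes with the label channels
    counts    = c * Y.sum(0)                  # neighbourhood label histogram
    shared    = b * s.sum() + d               # same value added to every label
    return per_label + counts + shared

X = np.array([[0.2, 0.8], [1.0, 0.0], [0.5, 0.5]])
Y = np.eye(3)[[0, 2, 0]]                      # neighbour labels: 0, 2, 0
print(equivariant_layer(X, Y))
```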
Beyond Worst-Case Dimensionality Reduction for Sparse Vectors
Silwal, Sandeep, Woodruff, David P., Zhang, Qiuyi
We study beyond worst-case dimensionality reduction for $s$-sparse vectors. Our work is divided into two parts, each focusing on a different facet of beyond worst-case analysis: We first consider average-case guarantees. A folklore upper bound based on the birthday paradox states: For any collection $X$ of $s$-sparse vectors in $\mathbb{R}^d$, there exists a linear map to $\mathbb{R}^{O(s^2)}$ which \emph{exactly} preserves the norm of $99\%$ of the vectors in $X$ in any $\ell_p$ norm (as opposed to the usual setting where guarantees hold for all vectors). We give lower bounds showing that this is indeed optimal in many settings: any oblivious linear map satisfying similar average-case guarantees must map to $\Omega(s^2)$ dimensions. The same lower bound also holds for a wide class of smooth maps, including `encoder-decoder schemes', where we compare the norm of the original vector to that of a smooth function of the embedding. These lower bounds reveal a separation result, as an upper bound of $O(s \log(d))$ is possible if we instead use arbitrary (possibly non-smooth) functions, e.g., via compressed sensing algorithms. Given these lower bounds, we specialize to sparse \emph{non-negative} vectors. For a dataset $X$ of non-negative $s$-sparse vectors and any $p \ge 1$, we can non-linearly embed $X$ into $O(s\log(|X|s)/\epsilon^2)$ dimensions while preserving all pairwise distances in $\ell_p$ norm up to $1\pm \epsilon$, with no dependence on $p$. Surprisingly, the non-negativity assumption enables much smaller embeddings than are possible for arbitrary sparse vectors, where the best known bounds suffer an exponential dependence. Our map also guarantees \emph{exact} dimensionality reduction for $\ell_{\infty}$ by embedding into $O(s\log |X|)$ dimensions, which is tight. We show that both the non-linearity of the embedding map $f$ and the non-negativity of $X$ are necessary, and provide downstream algorithmic improvements.
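For reference, the folklore birthday-paradox map mentioned in the first part is easy to state as code (the constant in the target dimension is illustrative): hash the $d$ coordinates into $O(s^2)$ buckets and sum within each bucket; whenever no two support coordinates collide, the nonzero values survive untouched and every $\ell_p$ norm is preserved exactly.

```python
# Sketch of the birthday-paradox embedding for s-sparse vectors.
import numpy as np

def sparse_embed(x, s, seed=0, c=50):
    rng = np.random.default_rng(seed)
    m = c * s * s                                  # target dimension O(s^2)
    buckets = rng.integers(0, m, size=x.shape[0])  # oblivious coordinate hashing
    y = np.zeros(m)
    np.add.at(y, buckets, x)                       # sum coordinates per bucket
    return y

d, s = 10_000, 5
rng = np.random.default_rng(1)
x = np.zeros(d)
x[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)
y = sparse_embed(x, s)
print(np.linalg.norm(x, 1), np.linalg.norm(y, 1))  # equal when no bucket collision
```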
Review for NeurIPS paper: Coupling-based Invertible Neural Networks Are Universal Diffeomorphism Approximators
Additional Feedback: [POST REBUTTAL] --------- I thank the authors for the detailed response. I guess I see how one can parameterize some (Real NVP-style) linear couplings combined with permutations and sign flips to represent any regular matrix (regular here denoting invertible, I suppose?). Perhaps this construction could be made explicit in the paper to complement the results. Does it also imply that the general linear group in the main result can be replaced with the permutation group together with sign flips? Furthermore, if someone is only concerned with diffeomorphisms whose Jacobian has strictly positive eigenvalues, can the sign flips be dropped?
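A hedged sketch of the factorization the question gestures at (mine, not the authors' construction): any $A \in GL_n(\mathbb{R})$ admits an LU-style decomposition
$$A = P\,L\,S\,\Lambda\,U, \qquad L = \prod_{k}\bigl(I + \ell_k e_k^{\top}\bigr), \qquad U = \prod_{k}\bigl(I + e_k u_k^{\top}\bigr),$$
where $P$ is a permutation matrix, $S = \mathrm{diag}(\pm 1)$ collects the sign flips, $\Lambda$ is a positive diagonal matrix, and $L$ (resp. $U$) is unit lower (resp. upper) triangular. Each Gauss factor $I + \ell_k e_k^{\top}$ or $I + e_k u_k^{\top}$ fixes one block of coordinates and adds a linear function of it to the rest, i.e. it is the Jacobian of an additive (Real NVP-style) coupling, and $\Lambda$ can be absorbed into constant-scale affine couplings; so couplings, permutations, and sign flips indeed reach every regular matrix.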
Bayesian Manifold Learning: The Locally Linear Latent Variable Model (LL-LVM)
We introduce the Locally Linear Latent Variable Model (LL-LVM), a probabilistic model for non-linear manifold discovery that describes a joint distribution over observations, their manifold coordinates, and locally linear maps, conditioned on a set of neighbourhood relationships. The model allows straightforward variational optimisation of the posterior distribution on coordinates and locally linear maps from the latent space to the observation space given the data. Thus, the LL-LVM encapsulates the local-geometry-preserving intuitions that underlie non-probabilistic methods such as locally linear embedding (LLE). Its probabilistic semantics make it easy to evaluate the quality of hypothesised neighbourhood relationships, to select the intrinsic dimensionality of the manifold, to construct out-of-sample extensions, and to combine the manifold model with additional probabilistic models that capture the structure of coordinates within the manifold.
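One way to picture the local-linearity term that drives such a model (my paraphrase of the usual locally linear penalty, not necessarily the paper's exact parameterisation): for neighbours $(i, j)$ in the neighbourhood graph $G$,
$$\log p(\mathbf{y} \mid \mathbf{x}, \mathbf{C}, G) = -\frac{1}{2\gamma} \sum_{(i,j) \in G} \bigl\lVert (y_i - y_j) - C_i\,(x_i - x_j) \bigr\rVert^2 + \text{const},$$
so each local map $C_i$ is asked to carry latent displacements around $x_i$ onto observed displacements around $y_i$; Gaussian priors over the coordinates and the maps then keep the posterior amenable to variational optimisation.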
Self-Attention as a Parametric Endofunctor: A Categorical Framework for Transformer Architectures
Self-attention mechanisms have revolutionised deep learning architectures, yet their core mathematical structures remain incompletely understood. In this work, we develop a category-theoretic framework focusing on the linear components of self-attention. Specifically, we show that the query, key, and value maps naturally define a parametric 1-morphism in the 2-category $\mathbf{Para(Vect)}$. On the underlying 1-category $\mathbf{Vect}$, these maps induce an endofunctor whose iterated composition precisely models multi-layer attention. We further prove that stacking multiple self-attention layers corresponds to constructing the free monad on this endofunctor. For positional encodings, we demonstrate that strictly additive embeddings correspond to monoid actions in an affine sense, while standard sinusoidal encodings, though not additive, retain a universal property among injective (faithful) position-preserving maps. We also establish that the linear portions of self-attention exhibit natural equivariance to permutations of input tokens, and show how the "circuits" identified in mechanistic interpretability can be interpreted as compositions of parametric 1-morphisms. This categorical perspective unifies geometric, algebraic, and interpretability-based approaches to transformer analysis, making explicit the underlying structures of attention. We restrict to linear maps throughout, deferring the treatment of nonlinearities such as softmax and layer normalisation, which require more advanced categorical constructions. Our results build on and extend recent work on category-theoretic foundations for deep learning, offering deeper insights into the algebraic structure of attention mechanisms.
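Since the paper restricts attention to the linear components, a short sketch of what those components compute is easy to give (parameter names and the omission of softmax/normalisation are mine, matching the stated restriction): the query, key, and value maps are plain linear maps, and the resulting block is equivariant to permutations of the input tokens.

```python
# Linear-only attention sketch: softmax and layer norm deliberately omitted.
import numpy as np

def linear_attention(X, Wq, Wk, Wv):
    """Token mixing built purely from the query/key/value linear maps."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    return (Q @ K.T) @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3))              # 4 tokens, 3-dim features
Ws = [rng.standard_normal((3, 3)) * 0.1 for _ in range(3)]

# Permuting the input tokens permutes the output identically (equivariance).
P = np.eye(4)[[2, 0, 3, 1]]
print(np.allclose(linear_attention(P @ X, *Ws), P @ linear_attention(X, *Ws)))  # True
```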